slimTrain---A Stochastic Approximation Method for Training Separable Deep Neural Networks
Authors
Elizabeth Newman, Julianne Chung, Matthias Chung, Lars Ruthotto
Abstract
Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include nonconvexity, nonsmoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving a series of repeated training problems, which is time-consuming and demanding of computational resources. This can limit the applicability of DNNs to problems with nonstandard, complex, and scarce datasets, e.g., those arising in scientific applications. To remedy the challenges of DNN training, we propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters and fast initial convergence. The central idea of slimTrain is to exploit the separability inherent in many DNN architectures; that is, we separate the DNN into a nonlinear feature extractor followed by a linear model. This separability allows us to leverage recent advances made for solving large-scale, linear, ill-posed inverse problems. Crucially, for the linear weights, slimTrain does not require a learning rate and automatically adapts the regularization parameter. In our numerical experiments using function approximation tasks arising in surrogate modeling and dimensionality reduction, slimTrain outperforms existing DNN training methods with their recommended hyperparameter settings and reduces the sensitivity of DNN training to the remaining hyperparameters. Since slimTrain operates on mini-batches, its computational overhead per iteration is modest, and savings can be realized by reducing the number of iterations (due to its quicker initial convergence) or the number of training problems that need to be solved to identify effective hyperparameters.
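To make the separability idea concrete, here is a minimal sketch (in PyTorch) of one simplified, alternating variant of it: on each mini-batch, the linear output weights are computed in closed form by solving a Tikhonov-regularized least-squares problem, so they need no learning rate, while the nonlinear feature-extractor weights take an ordinary SGD step. This is an illustration under stated assumptions, not the slimTrain algorithm itself; the network sizes, the name feature_extractor, and the fixed regularization parameter lam are all invented for the example, whereas slimTrain adapts the regularization parameter automatically and draws on methods for large-scale, linear, ill-posed inverse problems for the linear update.

    # Sketch of separable training: solve for the linear weights per batch,
    # take a gradient step only on the nonlinear feature extractor.
    import torch

    torch.manual_seed(0)
    x, y = torch.randn(256, 4), torch.randn(256, 2)   # toy regression data

    feature_extractor = torch.nn.Sequential(          # nonlinear part
        torch.nn.Linear(4, 16), torch.nn.Tanh(),
        torch.nn.Linear(16, 8), torch.nn.Tanh(),
    )
    opt = torch.optim.SGD(feature_extractor.parameters(), lr=1e-2)
    lam = 1e-3  # fixed Tikhonov parameter (slimTrain adapts this automatically)

    for epoch in range(10):
        for i in range(0, 256, 32):
            xb, yb = x[i:i + 32], y[i:i + 32]
            z = feature_extractor(xb)                 # features, shape 32 x 8
            with torch.no_grad():
                # Linear subproblem: min_W ||z W - yb||^2 + lam ||W||^2,
                # solved exactly via the regularized normal equations.
                A = z.T @ z + lam * torch.eye(z.shape[1])
                W = torch.linalg.solve(A, z.T @ yb)   # closed form, no learning rate
            loss = ((z @ W - yb) ** 2).mean()         # gradient flows through z only
            opt.zero_grad()
            loss.backward()
            opt.step()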
Similar resources
Provable approximation properties for deep neural networks
We discuss approximation of functions using deep neural nets. Given a function f on a d-dimensional manifold Γ ⊂ R^m, we construct a sparsely-connected depth-4 neural network and bound its error in approximating f. The size of the network depends on dimension and curvature of the manifold Γ, the complexity of f, in terms of its wavelet description, and only weakly on the ambient dimension m. Es...
Why Deep Neural Networks for Function Approximation?
Recently there has been much interest in understanding why deep neural networks are preferred to shallow networks. We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation. First, ...
Adaptive dropout for training deep neural networks
Recently, it was shown that deep neural networks can perform very well if the activities of hidden units are regularized during learning, e.g., by randomly dropping out 50% of their activities. We describe a method called 'standout' in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero. This 'adapt...
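As a rough illustration of the mechanism described above, the keep-probability of each hidden unit can be produced by an overlaid network rather than fixed at 50%. This is a hedged sketch, not the authors' 'standout' implementation (which ties the overlay to the hidden layer's own weights); the overlay network here is an invented stand-in for the binary belief network.

    # Adaptive-dropout-style masking: per-unit keep probabilities come
    # from an overlay network instead of a fixed 50% dropout rate.
    import torch

    x = torch.randn(32, 10)                # a mini-batch of inputs
    hidden = torch.nn.Linear(10, 20)
    overlay = torch.nn.Linear(10, 20)      # stand-in for the belief network

    h = torch.tanh(hidden(x))              # hidden activities
    keep_prob = torch.sigmoid(overlay(x))  # adaptive per-unit probabilities
    mask = torch.bernoulli(keep_prob)      # binary mask
    h_dropped = h * mask                   # selectively set activities to zero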
Exploring Strategies for Training Deep Neural Networks
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise u...
A conjugate gradient based method for Decision Neural Network training
Decision Neural Network is a new approach for solving multi-objective decision-making problems based on artificial neural networks. Using inaccurate evaluation data, network training is improved and the number of training data sets required is decreased. The available training method is based on the gradient descent method (BP). One of its limitations is related to its convergence speed. Therefore,...
Journal
Journal title: SIAM Journal on Scientific Computing
Year: 2022
ISSN: 1095-7197, 1064-8275
DOI: https://doi.org/10.1137/21m1452512